Abstract

This manuscript presents the concerns around the increasingly common problem of not having readily available or 
useful "gold standard" measurements. This issue is particularly important in critical care where many measurements 
used in decision making are surrogates of what we would truly wish to use. However, the question is broad, 
important and applicable in many other areas. 

In particular, a gold standard measurement often exists, but is not clinically (or ethically in some cases) feasible. The 
question is how does one even begin to develop new measurements or surrogates if one has no gold standard to 
compare with? 

We raise this issue concisely with a specific example from mechanical ventilation, a core bread and butter therapy in 
critical care that is also a leading cause of length of stay and cost of care. Our proposed solution centers around a 
hierarchical validation approach that we believe would ameliorate ethics issues around radiation exposure that make 
current gold standard measures clinically infeasible, and thus provide a pathway to create a (new) gold standard. 
The gold standard

Medicine is dominated by measurements. Behind virtually 
every decision lurks one or more measurements of one or 
more physiological or biochemical parameters. Although 
it is sometimes less directly acknowledged, there is the 
obvious significant impact of the quality of the measure- 
ment on the ability to deliver the desired quality of care. 

Thus, for most measurements in medicine, there exists 
one or more "gold standards". While there is debate over 
the definition [1], "gold standard" here simply represents 
the best available measurement of a parameter, even if it 
is not the most cost effective or clinically feasible. These 
gold standards are the metrics to which any new meas- 
urement is compared, as well as the means by which 
potential variability in study outcomes can be assessed. 

Incorporating evidence-based medicine into current 
clinical practice is a main avenue for optimizing care [2].
However, this goal has proven very difficult in critical 
care medicine, where there are numerous examples of 
clinical trials yielding conflicting results or failing to de- 
liver clear results. It bears repeating that measurements 
are at the heart of the intervention and outcome assess- 
ment in these clinical trials. The practice of critical care, 
perhaps more than any other specialty, is driven by
metrics that are surrogates of pathophysiological pro- 
cesses. Diagnostics in critical care are not primarily la- 
boratory based, as they are in oncology or hematology 
for example. Thus, one can test for cell markers in 
types of lymphoma, but there are no gold standard 
diagnostics to quantify pathophysiological processes 
seen in acute respiratory distress syndrome (ARDS) and 
septic shock. Yet, managing these conditions and other 
pathophysiological derangements is the 'bread and
butter' of critical care. 

Mechanical ventilation (MV) in ARDS is one area for 
which there is no well-accepted approach to care, in 
particular, positive end expiratory pressure (PEEP) selec- 
tion, with several conflicting randomised trial results [3]. 
The main goals of MV are to support patient breathing 
and, where possible, select a PEEP that safely maximises 
recruitment. Thus, while clinical outcome variables, such 
as mortality or length of MV, can be linked to a given 
approach or method, there is no guarantee that the
intervention being tested achieved the recruitment desired [4].

Fortunately, there is a relatively well-accepted gold 
standard measurement of recruitment, the Computed 
Tomography or CT scan, that is currently the most 
accepted, if not fully proven, metric for titrating care 
[5]. Certainly, one or more CT slices of the lung can 
clearly show the level and amount of recruitment [6],
providing clear, direct and effective measurement of 
the immediate clinical outcome and impact of an inter- 
vention. It has been used in animal studies [7,8] and some 
human studies [6] to assess recruitment, although it is not 
effective for repeated dynamic assessments. 

In particular, there is increasing opposition to using, or
overusing, radiation-based medical imaging, and CT in
particular, even in research studies [9,10]. Equally, CT has never
been very practical, nor cost effective, in clinical practice 
for assessing recruitment. Just moving the patient and 
having to ventilate using a longer breathing circuit can 
significantly impact pulmonary mechanics and patient 
condition. Finally, it is not only impractical to use CT 
to make frequent adjustments of PEEP (up to 4 times 
daily), but it would also expose patients to unacceptably 
high doses of radiation. 

Thus, the field is left without any gold standard that 
can effectively or ethically be used in care or research. 
This latter use is critical, because without a high quality 
gold standard that is also clinically feasible, it is not possible 
to safely or effectively translate animal studies or similar 
results into human studies and improved outcomes. As a 
result, emerging non-invasive approaches, such as elec- 
trical impedance tomography (EIT) [11] and model-based 
methods [12], cannot even be assessed to determine their 
efficacy in humans. This failure leaves these modalities 
stranded without a means of validation in human subjects. 
More specifically, when there is no gold standard direct 
measurement, there is no way to prove its replacement or 
surrogate has equal or effective clinical value. 

A path forward 

What would be useful is a validation roadmap by which 
new methods could be assessed safely and ethically. 
Figure 1 shows a 3-phase validation approach with the 
narrowing pyramid indicating the potential for fewer
methods to pass each stage. The end goal is validation 
on critically ill cohorts, but only subsequent to proof 
that any new method can: 

• Accurately capture recruitment and recruited 
volume in the heterogeneous ARDS lung. 

• Work effectively in humans. 

The rationale for each phase in Figure 1 is as follows:

1. Animal Trials: allow a heterogeneous ARDS lung to 
be induced and for a method to be tested over the 
evolution of the disease state. Several CT slices can 
be taken at each step without concern over radiation 
dose. Several slices allow the error of any method to
be quantified accurately, both in absolute terms and
relative to the variability across the CT slices.
Figure 1 A simple 3-phase validation pathway.

2. Healthy Human Trials: allow any method to show it 
works in humans. Given recent evidence that 
modest recruitment can be obtained in the healthy 
human lung with small added pressure, this 
validation should be enough to show ability to work 
with human anatomy. In these trials, only a single 
CT slice might be taken to minimise radiation. 
Healthy humans are chosen for this level because 
they can tolerate a small radiation dose with the 
minimum possible risk. 

3. Critically Ill Trials: the final validation on MV
patients, using only a single CT slice as required as 
final validation of any concept that passes the first 
two phases. 

Thus, using this approach, there are no trials on critic- 
ally ill patients and no significant added risk until a new 
method is proven in animals (no added human risk) and 
healthy humans (no significant added human risk). 
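
The phase 1 comparison of a method's error against the variability across CT slices can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name, the use of the mean of the slices as the reference value, the sample standard deviation as the measure of slice variability, and the numbers in the usage example, which are invented and not study data.

```python
from statistics import mean, stdev


def error_relative_to_slice_variability(method_estimate, ct_slice_values):
    """Compare one recruitment estimate against several CT slice measurements.

    Hypothetical helper: the reference is taken as the mean of the CT slices,
    and slice variability as their sample standard deviation.
    Returns (absolute_error, slice_sd, error_to_variability_ratio).
    """
    if len(ct_slice_values) < 2:
        # A single slice gives no estimate of variability across slices.
        raise ValueError("need at least two CT slices to estimate variability")

    reference = mean(ct_slice_values)
    slice_sd = stdev(ct_slice_values)
    absolute_error = abs(method_estimate - reference)

    if slice_sd == 0:
        # Identical slices: the ratio is only defined when the error is also zero.
        ratio = 0.0 if absolute_error == 0 else float("inf")
    else:
        ratio = absolute_error / slice_sd
    return absolute_error, slice_sd, ratio


# Illustrative values only (arbitrary units), not measured data.
print(error_relative_to_slice_variability(15.0, [10.0, 12.0, 14.0]))
```

In this invented example the estimate lies 1.5 slice standard deviations from the CT reference. A ratio near or below 1 would suggest the method's error is comparable to the disagreement between the CT slices themselves, which is why several slices per step are useful in the animal phase.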

Some final thoughts 

So, where does this situation leave us? Perhaps the gold 
standard does not exist. Perhaps, we should abandon the 
'gold standard', as governments did in the 1930s, for a
more adaptive method that can provide real-time metrics 
that could ultimately be compared in large, well-designed
randomised trials. Or possibly, we should create a consen- 
sus or agreed pathway to creating a true gold standard 
to enable better randomised trials and thus better care. 

Otherwise, these issues leave the field without mea- 
surements that are critical for providing and, especially, 
improving care. It is a general problem: what do you do
when the value of (your) gold (standard) is zero? 